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1. INTRODUCTION 

First we should like to thank the editor for al- 
lowing us to respond to interesting discussions from 
the discussants, Molenberghs, Kenward and Verbeke 
(MKV), Louis and Meng, for the effort they have 
put into their replies, and for the many important 
points that they have raised. 

We view statistics as comprising relationships be- 
tween models and data, where a statistical model is 
a formal mathematical formula which in some sense 
represents the patterns in the data. It represents a 
tool underlying the process of "making sense of fig- 
ures." There are two processes linking models and 
data. The first, which we term the forward process, 
can be written as 



model 



data. 



This stands for, "given a model, what would the 
data that it generates look like?" We call this pro- 
cess statistical modelling and it forms the basis of 
probability theory. The second process, which we 
term the backward process, can be written as 

model < — data. 

This stands for, "given data, and a (guessed) model 
what can we say about the parameters in that 
model?" We call this process statistical inference, 
and it is displayed in Efron's (1998) triangular di- 
agram for 21st-century statistical research, involv- 
ing the three schools, Fisherian, Bayesian and Fre- 
quentist. The process of inference involves two pro- 
cedures, namely model fitting and model checking. 
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In the first we find values for the parameters in the 
model that fit the data best, and in the second we 
use probability theory to check whether the fit and, 
therefore, the assumed model is acceptable, by look- 
ing at the distribution of a suitable badness-of-fit 
statistic. Model checking could lead to a new model 
involving the two processes. 

Among the discussants, MKV seem to suggest that 
data contain information only about the parameters 
in the marginal likelihood, but not about the unob- 
servables (random effects) in the h-likelihood. Louis 
and Meng say that extended likelihood such as h- 
likelihood does indeed carry information about the 
unobservables, but that nevertheless the Bayesian 
approach is best suited for such inferences. We hope 
to show how the ideas can be combined together in 
the h-likelihood framework to give a new type of 
statistical inference. We shall try to make clear the 
inferential status of our framework. 

The Bayesian framework is a well-defined mathe- 
matical structure about which theorems can be proved. 
However, it requires a statement of subjective prior 
belief about the unknown parameters which we are 
unable to provide. Of course many attempts have 
been made to define "objective" priors, but we be- 
lieve them not to have been successful. As Barnard 
(1995) used to stress, in scientific inference the aim 
is to look for objective conclusions that scientists 
can agree upon. Senn (2008) puts it more strongly 
when he writes, "In fact the gloomy conclusion to 
which I am drawn on reading de Finetti (1974) is 
that ultimately the Bayesian theory is destructive 
of any form of public statistics." An alternative de- 
scription is that we are looking across data sets 
for significant sameness, structures that remain un- 
changed when external conditions vary, a view which 
has been strongly propagated by Ehrenberg (1975). 
Another problem of using priors on parameters is 
that however many data are collected, no informa- 
tion is added regarding the parameters in the prior. 
In contrast to the many possible priors in Bayesian 
framework, in our system there is only one corre- 
sponding prior likelihood (Pawitan, 2001) for pa- 
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rameters, namely L(0) = 1, and as data grow in- 
formation is accumulated on all parameters in the 
model. Model checking is a vital part of inference 
and we regard accumulated information as neces- 
sary for model checking to be effective. Science is an 
entirely open-ended procedure, and there can be no 
possibility of assigning probabilities to all the mod- 
els we have not thought of yet. 

Louis says that our paper is, "more of an opinion 
piece than a scientific comparison of approaches." In 
fact, we see no fault in presenting an opinion piece; 
indeed, the title itself should prepare the reader for 
what follows. We have indeed carried out extensive 
simulations with a wide variety of data comparing 
our estimates with those of other methods, and so 
far h-likelihood estimates have been often uniformly 
better in terms of mean-square error. Some of these 
comparisons may be found in Lee, Nelder and Paw- 
itan (2006). We think our solution to mending the 
plug-in empirical Bayesian (Carlin and Louis, 1996) 
method is more straightforward than that of Louis 
which requires yet another level of priors. We have 
used the Laplace approximation when explicit ex- 
pression for integration is not available. It seems 
to work well in problems involving unobservables, 
except for a few extreme cases where second-order 
Laplace approximations are required. We support 
the view that, "the art of applied mathematics is to 
know when to approximate." 

Louis seems to regard our paper as over-promotion 
of h-likelihood, but maybe that is because we are 
highly enthusiastic about h-likelihood. In the paper 
we do in fact make scientific comparisons as opposed 
to the typical Baysian, who rarely makes compar- 
isons with other approaches. As we shall discuss in 
Section 4, with a Bayesian model, the h-likelihood 
is equivalent to the Bayes posterior so there is no 
disagreement in the case of a Bayesian model (see 
Bj0rnstad, 1996 for more discussion). We try to give 
a unified framework for statistical methods devel- 
oped by the three schools, but in our approach we re- 
gard a fixed parameter as simply unknown so that it 
differs from efforts of unification from the Bayesian 
side (Box, 1980; Little, 2006). 

We shall now respond to the discussants by topic 
instead of responding to individual contributions. 

2. LP, MODEL CHECKING AND MODEL 
CHOICE 

The likelihood principle (Birnbaum, 1962) pro- 
vides a very good reason for using likelihood for in- 
ferences, but it does not show how this should be 



done. We have shown over the last 15 years how to 
do so. One drawback of LP and likelihood meth- 
ods in general is that they do not tell us what to 
do when the model in use is not right. Suppose 
that we have a random-effect model. If the model is 
right the standard sandwich standard error estima- 
tors can be made from the marginal likelihood. How- 
ever, we can have another sandwich estimator from 
the h-likelihood which is useful when the homogene- 
ity assumption on the variance of random effects is 
violated (Lee, 2002). H-likelihood also shows how to 
reduce the bias of the ML estimators in frailty mod- 
els with nonparametric baseline hazards (Ha, Noh 
and Lee, 2010). 

MKV claim that inferences about unobservables 
cannot be made because they are not identifiable. 
This is not so. Unobservables can change their sta- 
tus to fixed unknowns once the sample has been 
observed, as we shall discuss in Section 6.2. Thus in 
principle unobservables can be estimated and model 
assumptions can be checked after the data have been 
collected. However, unobservables occur in various 
forms, for example, as a random effect or as a miss- 
ing datum. One cannot extract any information about 
the latter from observed data while this can be done 
about the former. Thus for the latter, model check- 
ing based on observed data is not possible. 

However, model checking is very important. We 
have shown that the distributional assumptions on 
the random effect can be checked, and have devel- 
oped various model-checking procedures and criteria 
for model selection. Suppose we have two different 
random-effect models, 

Ui = Xi(3 + WiW + Bi and yi = Xi/3 + Uiu + a, 

which lead to the same marginal model. The two 
models could have different numbers of random ef- 
fects. For each model the assumptions about the 
random effects can be checked by the model-checking 
procedures given in Lee, Nelder and Pawitan (2006). 
If the assumed model is correct we can give esti- 
mates for each random component. However, the in- 
dividual random-effect estimators of the two models 
cannot be matched. Nevertheless, Lee and Nelder 
(2006) show that the two models, if equivalent, give 
the same inferences for equivalent quantities, for ex- 
ample, that WiW = UiU, giving the same predictions 
for the data, yi(w) = Xi(3 + WiW = yi{u) = Xi/3 + UiU. 
Now suppose that they lead to the same marginal 
model for y, but give different predictions for the 
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data, yi{w) ^ Vi{u). Then a model choice can be 
made from the deviances 

D(w) = Y J {vi-yi{w)} 2 

and 

D{u) = Y J {Vi-Hu)}\ 

where D{w) ^ D(u). These deviances are constructed 
from the conditional likelihood fe(y\v) (see Lee and 
Nelder, 1996 and Lee, Nelder and Pawitan, 2006, 
Chapter 6.5). We, like Bayesians (Spigelhalter et 
al., 2002) use the so-called deviance information cri- 
terion (DIC) based on its degrees of freedom (see 
Ha, Lee and MacKenzie, 2007 and Vaida and Blan- 
chard, 2005 for more discussion). MKV seem to re- 
gard two models as equivalent if they lead to the 
same marginal model. How can the two models with 
different predictions be equivalent? Consider a one- 
way random model (Ml) leading to a marginal mul- 
tivariate model with composite symmetric covari- 
ance structure (M2). They cannot be the same model 
because they give different predictors. The former 
(Ml) exploits the covariance structure to give a bet- 
ter prediction. Regardless of how the data are gener- 
ated, if the covariance structure is composite a sym- 
metric one-way random model gives a better pre- 
diction than that based on the common mean of the 
marginal model (M2). The random-effect model is 
in fact an advancement on the marginal model be- 
cause it shows how to predict. Various time series 
models and spatial models have been proposed in 
this respect. 

Suppose that the unobservables v are missing data 
y m i s . Let i)i(R) and yi(NR) be the predicted values 
of the missing data under missing at random (MAR) 
and missing not at random (MNAR), respectively. 
The deviances are then 

D(R) = ^{m-m(R)} 2 

and 

D(NR) = Y,{yi-UNR)} 2 . 

Now suppose that the two missing mechanisms, MAR 
and MNAR, give the same predictions for the ob- 
served data while giving different predictions for the 
missing data. Then we have 

D{R) = A + ^{y mis ,i - y m isA R )f 

and 

d(nr) = a + J2{y™,i - y™sA N R)} 2 > 



where A = J2{yobs,i ~ y bs,i( R )} 2 = J2{yobs,i - 
iiobs,i(N R)} 2 . In this case we cannot make a model 
choice based upon the deviance because both y m j s ,i 
and their predictors are based upon the model as- 
sumptions for the missing data. Even though 
2/mis,i(-R) ¥= ymis,i{ NR ) we cannot observe y miSii to 
evaluate them. In this case, given only the observed 
data, sensitivity analysis can be used to show how 
inferences about y m i s ,i(R) and y m is,i(NR) vary as 
the model for the unobservables changes. However, 
we may never be able to draw any conclusions from 
the analysis because we do not have the data y m j s ,i 
to check our thinking. We are also unable to say 
that we have checked for all the possible ways about 
which our thoughts may be wrong. 

We agree that care is necessary, as MKV say, but 
something can still be done about inferences for ran- 
dom effects of the observed data. 

3. NON PARAMETRIC, SEMI-PARAMETRIC 
MODELS AND GEE 

It is not always easy to check all the assumptions 
of a given model. For example, with binary data it is 
hard to check the distributional assumption about 
the random effects. In semi-parametric frailty mod- 
els with nonparametric baseline hazards we can re- 
lax the specification of probability models on cer- 
tain parts of the model. However, this differs from 
the lack of a probability basis, such as is shown in 
some GEEs. In a given semi-/nonparametric model, 
there are many classes of submodels which belong 
to the model. It is not correct that estimating equa- 
tions such as the quasi-likelihood estimating equa- 
tions (father of GEE) satisfy only the first two mo- 
ment (or minimal) conditions; Lee and Nelder (1999) 
showed that they satisfy all the higher cumulant 
conditions of a GLM family if it exists for the given 
mean and variance relationships. We have demon- 
strated that the estimating equations implicitly im- 
pose assumptions about the higher cumulants, so 
that a choice can be made depending upon the ro- 
bustness of model misspecifications. We are not 
against the use of GEE when it has a proper model 
basis. However, its claimed advantage of less sensi- 
tivity to model assumptions results from not com- 
paring like with like. 

In HGLMs, the distribution of random effects can 
be relaxed to give nonparametric ML estimators 
(Laird, 1978). Parameter estimates from binary 
GLMMs can be sensitive to the distributional as- 
sumptions of the random effects. A solution to this 
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is to allow heavy-tailed distributions for the ran- 
dom effects, to give a robust analysis (Noh, Pawitan 
and Lee, 2005). Thus various parts of the model as- 
sumptions in HGLMs can be relaxed to produce new 
nonparametric or semi-parametric models. Ha, Noh 
and Lee (2010) show that in semi-parametric frailty 
models h-likelihood extends the partial likelihood of 
Cox (1975) to produce new efficient estimating equa- 
tions. 

4. APHLS VERSUS MARGINAL POSTERIORS 

Louis and Meng both say that our adjusted profile 
h-likelihood (APHL) in Section 3.2 is a Laplace ap- 
proximation to their marginal posterior. This is not 
true because it can also eliminate fixed unknowns. 
Consider the h-likelihood, 

h = h(9,v) = log f e (y\v) + log fo(v) = log f e (y,v) 

= log f e (y) + log fg(v\y). 

Bayesian models are composed of two objects, namely 
the data and unobservables; 9 is not fixed unknown, 
but unobservable. Thus their model is 

B = h + logir(9), 

where h = log f(y\v, 9) + log f{y\6). Thus the Bayes- 
ian framework eliminates 6 by integration, even with 
the use of the improper prior tv(0) = 1. 

Suppose that all 8 are indeed unobservables with 
a known distribution n(9). Then the extended LP 
says that B carries all the information about un- 
observables (9,v). In such a case the Bayesian ap- 
proach gives suitable statistical inferences, and the 
Bayesian and h-likelihood inferences are equivalent. 

But, suppose that 6 represents fixed unknown pa- 
rameters, rather than unobservables. As Meng says 
there is no truly noninformative prior, at least for 
continuous parameters. This means that for infer- 
ences about parameters, use of the Fisherian likeli- 
hood would be suitable. To eliminate nuisance pa- 
rameters we could use profiling, conditioning or piv- 
oting methods, as developed by Fisherian and fre- 
quentist schools. Fisher (1934) shows that 

log f e (§\ A) <xm(9)-m(9), 

where m(9) = log fg(y), 9 is the ML estimator and A 
is an ancillary statistic. A wonderful generalization 
of Fisher's work (Barndorff-Nielsen, 1983) gives the 
so-called magical formula, 

log f e (9\A) oc m(9) - m{9) + \D{m, 9), 



where D(m,9) is defined in Section 3.2 of the main 
paper. 

Let 9 = (£, r) with £ being the nuisance parame- 
ters and r being the parameters of interest. Suppose 
that £ and r are orthogonal parameters. Because the 
ML estimator is asymptotically sufficient, using the 
magical formula, we can eliminate the nuisance pa- 
rameter £ from the marginal likelihood, 

log My 1 1, A) =p 5 (m;r), 

where p^(m;T) is defined in Section 3.2 of the main 
paper, giving the Cox-Reid (1987) adjusted profile 
likelihood. This formula happens to be the same as 
the Laplace approximation, integrating out £. But it 
is actually using the Fisherian method of condition- 
ing out the fixed parameters. This adjustment im- 
proves the profiling method (Lee, Nelder and Paw- 
itan, 2006). We note that the elimination of pa- 
rameters and unobservables can be carried out in 
a uniform formula (APHL) which eliminates un- 
observables by integration (as with the Bayesian 
approach) and parameters by conditioning or (ad- 
justed) profiling. Thus our APHL is quite different 
from the Bayesian marginal posterior. We believe 
that the prior on fixed parameters is informative 
if the APHL and marginal posteriors differ. In our 
framework we use profiling, modified (or adjusted) 
profiling, or pivoting, as developed by the likelihood 
school, to eliminate nuisance fixed parameters while 
we use integrating-out techniques, as developed in 
the Bayesian school, to eliminate unobservables. Our 
technique, among other things, extends REML from 
normal models to the GLM class. 

5. HOW TO USE THE H-LIKELIHOOD 

Clearly Meng understands how to form the h- 
likelihood for inferences. Most complaints are caused 
by misunderstandings about how to use the likeli- 
hood for statistical inference. 

5.1 Neyman-Scott Problems and Nuisance 
Parameters 

When the number of nuisance parameters increases 
with the sample size, the ML estimators, jointly 
maximizing nuisance and parameters of interest to- 
gether, can give seriously biased estimates (Neyman 
and Scott, 1948). Similarly joint maximizations of 
the h-likelihood provide seriously biased estimates, 
as Little and Rubin (1983) have shown. However, 
if we maximize an appropriate APHL we can avoid 
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such problems (Lee, Nelder and Pawitan, 2006, Chap- 
ter 4). When the number of nuisance parameters in- 
creases with the sample size, the profile likelihood is 
often not satisfactory, and APHLs have been devel- 
oped for such cases. Yun, Lee and Kenward (2007) 
showed that Little and Rubin's (1983) complaints 
can be resolved by using appropriate APHLs. 

Lee, Nelder and Pawitan (2006) highlight the main 
philosophical difference between completed-data like- 
lihood for the EM method and the h-likelihood. In 
the former missing data are treated as unobserved 
data, while in the latter they are nuisance parame- 
ters so that the technique developed for parameter 
handling can be used for the efficient imputation 
of missing data (Kim, Lee and Oh, 2006). We have 
found that the h-likelihood may give very good im- 
putation even up to 95 percent missingness while the 
EM method suffers from a slow convergence and dis- 
torted results with over 30 percent missingness (Lee 
and Meng, 2005). We believe that our methods will 
lead to great improvements in the handling of miss- 
ing data. 

5.2 Invariance for Parameter Estimation and 
Cauchy-Type Distributions 

MKV complain that the likelihood method gives 
point estimates with undefined variance. Suppose 
that y follows an exponential distribution with the 
log-likelihood m = — log A — y/X. The ML estimate 
for A is X = y. Here the observed Fisher information 
is 1(A) = (-1/A 2 + 2y/A 3 ) = 1/A 2 which gives the 
correct variance estimator for A. However, the ML 
estimate of 1/A is 1/y whose moment does not ex- 
ist so that its variance estimator from the likelihood 
theory could be seemingly meaningless. The ML es- 
timators are invariant with respect to any transfor- 
mation. When the sample sized is fixed, ML estima- 
tors may have scales where their moments do not 
exist. Let 6 = 5/ip in equation (20) of the MKV dis- 
cussion. Then y k is the ML estimator for 6 k with a 
finite variance, for example, when < k < 1/2. The 
concept of variance may not be a useful measure of 
uncertainty in the ML estimator once the data are 
given. Whole likelihood curves or some feature of 
it such as the curvature would be useful (Pawitan, 
2001); see more discussion below. 

5.3 Invariance for the Estimation of 
Unobservables and Exponential Model 

Meng points out that consistency theory may not 
be applicable to estimators for unobservables. It has 



been one of the aims of the development of h-likelihood 
procedure to overcome such difficulties. In our frame- 
work we use the marginal ML estimators for fixed 
parameters whose invariance is well established. To 
maintain such invariance for unobservables we fix 
the scale of unobservables in defining the h-likelihood 
and use the mode for inferences. The sample mean 
cannot maintain the invariance with respect to trans- 
formation of unobservables while the mode does. 
Because we are using the mode to derive point es- 
timates for unobservables, its scale is important in 
defining the h-likelihood if we are to have good infer- 
ential properties. We appreciate that Meng gives a 
good theoretical contribution in deciding that scale. 
Another advantage of the mode estimator over the 
sample mean is that it also allows one to do model 
selection (Lee and Oh, 2010) instead of model av- 
erage. Recently, Ma and Jorgensen (2007) have ar- 
gued against the use of mode estimates for random 
effects and proposed the use of the orthodox best lin- 
ear unbiased predictor (OBLUP) method. However, 
Lee and Ha (2010) show that the h-likelihood mode 
estimation gives both statistically better precision 
and maintains the stated level of coverage probabil- 
ity better than the OBLUP method. 

Consider the exponential model of Meng in Sec- 
tion 7 of his contribution for predicting a future ob- 
servation u = y n +i. In Section 7.2 he uses the right 
h-likelihood h(X,v;y) with v = log u, giving A = y n . 
Note that 

■u(A) = E(u\y) = A, 

which is the so-called best predictor, giving u = 
u{X) = A. Now compute the following information 
matrix as in (4.3): 

/ n + 1 1 

I{\u) = I{\u)\ x=Xu=ii = ^ | 2 

V "I 2 A 2 

Note that var\(u\y) = var A (*u) =E(ii(X) — u) 2 = X 2 
Thus 

-i 



A 



gives the correct estimate of vav\(u\y). From 
/A? ¥ \ 



n n 
P (l + n)A 2 
n n 
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we get the correct estimate of 

/ -n2 / - ^2 (1 +n)\ 2 

var A (u-u) = var A (y n+ i - y n ) = . 

n 

This also gives the first-order approximation to 

CMSE(u) = E{(tt(A) - u){u{\) - u)'\y} 

= var A (u - u\y) 2 = A 2 + (y n - A) 2 . 

The delta-method for v = log u gives what Meng has 
for the variance of v but does not provide correct 
estimates for either var A (t> — v) 2 or var\(v — v\y) 2 . 

We have shown that there exists some analogy 
between inferences for the fixed parameters and for 
unobservables. However, there also exist differences 
between them. For a fixed constant, let say A = 3, 
we have g(\) = g(3) for any function g(-). Such an 
invariance is meaningful for an estimator of an un- 
known constant and can be achievable by consis- 
tency of ML estimators for fixed unknowns. Sup- 
pose that we are estimating E(ii|y) of unobservable 
u. Then, in general 

E(g(u)\y)^g{E(u\y)}, 

with equality holding only if g() is a linear function. 
Lee and Nelder (2005) showed that maintaining in- 
variance of inferences from the extended likelihood 
for trivial re-expressions of the underlying unob- 
servables leads to the definition of the h-likelihood. 
Once the data are observed we treat the unobserv- 
ables such as random effects as fixed unknowns as 
discussedjn the next Section so that we maintain 
g(u) = g(u) for any function g(-). However, the in- 
terpretation of the maximum h-likelihood estimator 
as E(u\y) holds only on a particular scale u. 

5.4 Invariance for Parameterization and ML 
Estimation 

In Bayarri's example, y k is the ML estimator of r 
for r = 9 k . Here the consistency of 9 k fails, so that 
the unbiasedness of 9 k and exact variance estima- 
tor of 9 k would be meaningful properties to achieve. 

Note that E(9 k ) becomes 9 k , and vai(9 k ) = I(9 k )~ l = 
(-d 2 m/dT 2 \ T= r)- 1 = 2k 2 9 2k becomes var(9 k ) as k 
approaches zero. Thus these desirable properties can 
be achievable on a particular scale of 9. The ex- 
istence of such a scale is important for inferences 
about fixed unknowns. For example, if there exists 
an exact confidence interval for a particular scale of 
a fixed parameter [let's say (f — L,f + U)], then it 
allows exact intervals for all parameterizations of r 



((g(f — L),g(r + U)) for any function g(-). A plot of 
the whole likelihood curve is useful to find a proper 
scale of intervals for unobservables too (Lee and Ha, 
2010). 

6. IS THE H-LIKELIHOOD INTERVAL FOR 
UNOBSERVABLES A CREDIBLE OR 
FIDUCIAL OR CONFIDENCE INTERVAL? 

Suppose that we have a model for the three ob- 
jects (y,9,v) where y and v are random variables 
(RVs) and 9 is a fixed unknown parameter. The sta- 
tistical model fe(v)fo(y\v) describes how the RVs y 
and v are generated. 

Once the data are observed as y Qt where the sub- 
script o stands for "observed," y Q are fixed knowns, 
no longer RVs. In the Bayesian framework all un- 
knowns are considered as random without allow- 
ing fixed unknowns. Bayesians assume a prior for 
9, making them random variables (RVs) so that the 
marginal posteriors 7r(#i|y ), ir(vi\y ), . . . etc. can be 
obtained by integrating out the rest of unknowns, 
regardless of whether these are fixed unknowns or 
unobservables. We find it mysterious how the fixed 
unknown parameters 9 can change their status to 
RVs, leading to a prior probability ir(9) that gener- 
ates 9. 

6.1 Intervals for Parameters 

Based on the likelihood L yo (9) = fo(y ), frequen- 
tists can derive confidence intervals for 9 (random 
interval for fixed unknown), and this has become 
a standard procedure. However, their argument is 
based on repeated sampling from the same popu- 
lation (RSSP) to which Fisher objected strongly. 
According to Fisher, when a scientist seriously "re- 
peats" an experiment he always has in mind at least 
the possibility that the population of the previous 
experiment may turn out not to be the same popula- 
tion from which he is currently sampling. If he really 
knew the population of the new experiment was ex- 
actly the same as before he would think of himself ei- 
ther as enlarging his original experiment or wasting 
his time. So Fisher believed that the repeated sam- 
pling from different populations (RSDP) was cru- 
cial in making intervals for unknowns. The Bayesian 
credible interval, based upon ir(vi\y ), meets this 
goal because inferences are confined to the experi- 
ment of the observed data y Q . Fisher tried to make 
an interval for 9, the so-called fiducial interval, un- 
der RSDP. But this RSDP assumption is too strong 
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a requirement for his fiducial interval to be applica- 
ble in general (Barnard, 1995) while the confidence 
interval can be made in various contexts. 

6.2 Intervals for Unobservables 

The EB interval based on f(v\\y ,9) has different 
treatments for fixed unknowns 9 and unobservables 
v. However, it cannot account for the information 
loss caused by estimating 9, leading to very liberal 
intervals. The Bayesian credible interval based on 
ir(vi\y ), while improving the EB interval a lot, can 
still exhibit strange behavior as shown in Figure 4 
of the main paper. Louis has suggested some other 
priors to try. However, we object to putting priors 
for fixed unknowns and recommend using likelihood 
methods for inferences about them. 

The observed data can be obtained as follows: 
From the statistical model fg(v) the unobservables 
are realized (generated) as v r where the subscript 
r stands for "realized." Note the use of the term 
"realized" instead of "observed" to emphasize that 
they are fixed but still unknown. Then, the observed 
data y are obtained from the model fe(y\v r ). Now 
suppose that we want to make an interval for the 
fixed unknowns v r given y Q . 

For linear mixed models, Henderson (1975) shows 
that the standard error estimate from the Hessian 
matrix I((3,v) in (4.3) of the main paper gives an 
estimate of the unconditional MSE 

E{{v(9)-v)(v(9)-v)'}. 

In 1996, we showed that this holds more generally 
in HGLMs. This means that our proposed interval 
can be viewed confidence interval (random in- 
terval for fixed unknown) for RSSP whose proba- 
bility statement is for unobserved future data. Sim- 
ulation results in Section 4.3 are from RSSP. The 
proposed 95 percent interval may not always cover 
v r , but among 100 intervals, 95 of them are expected 
to cover the realized value. 

It was Booth and Hobert (1998) who showed that 
I(/3,v) can also give an estimate of CMSE(t>) = 
E{(v(9) - v)(v(9) - v)'\y } for GLMMs with inde- 
pendent random effects. This result can be extended 
to nonnormal random-effect models (Lee and Ha, 
2010) and the correlated random-effect models in 
Section 4.3.1 (Lee, Jang and Lee, 2010). Given the 
observed data, we can therefore make an interval 
(fixed interval for fixed unknown) whose probability 
statement is for all possible future realizations of v. 
Fisher's aim of making intervals for RSDP may be 



generally achievable for realized values of unobserv- 
ables. 

Louis says that we cannot account for the informa- 
tion loss caused by estimating the dispersion param- 
eters while Bayesian marginalization can do so. In 
HGLMs dispersion parameters are orthogonal to the 
rest of parameters so that we do not actually need 
to account for the information loss caused by esti- 
mating them. In general, exactly the same method, 
that is, 1(9, v), is used to account for estimating 
all the parameters. The resulting interval is also 
identical to Kass and Steffey's (1989) approximate 
Bayesian credible interval (fixed interval for random 
unknown) with tt(6) = 1 (Lee, Jang and Lee, 2010). 

The probability statement of Bayesian interval 
based on Tt(v\\y ) is different from the previous two 
intervals for fixed unknown. Probability statement 
of Bayesian credible interval is for RV, that is, the 95 
percent Bayesian interval contains the unobservable 
with 95 percent probability given the data. Such a 
probability statement may not be relevant to the 
realized values of unobservables, but it is meaning- 
ful for inferences about unrealized unobservables, for 
example, inferences about future unobserved obser- 
vations. Our proposed interval also allows such a 
statement for future observations without requiring 
priors on 9. 

Our interval for unobservables could be a fiducial 
(fixed interval for fixed unknown), frequentist (ran- 
dom interval for fixed unknown) or Bayesian inter- 
val (fixed interval for random unknown), so allowing 
three different interpretations. Similarly, the APHL 
can be interpreted either as an approximate condi- 
tional likelihood eliminating nuisance fixed param- 
eters by the magical formula or as an approximate 
marginal posterior using the Laplace approximation 
with 7r(0) = 1. 

Louis says for nonstandard problems where the 
purpose of analysis is to find the shape of the dis- 
tribution for the unobservables (ir(vi \y )), Bayesians 
can offer a better algorithm using the MCMC method 
For such problems we agree that MCMC-type meth- 
ods are useful, but the question is whether we can 
do this without a subjective prior. Can we find a 
solution on which all three schools can agree? Why 
not consider MCMC applied to h-likelihood? 

6.3 Asymptotic versus Finite Sample Properties 
of ML Estimators 

Justification of ML inferences has relied heavily 
on asymptotic theory. On the other hand, Bayesian 
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inferences are exact in finite samples, but require 
the B-club fee, priors for fixed unknowns. However, 
agreed or agreeable priors may hardly exist. We have 
illustrated that an exact finite-sample solution for 
the ML method is possible by finding a proper scale, 
but that search for an exact scale cannot be an easy 
task. However, a practically satisfactory scale can 
be often found without much difficulty and simula- 
tion results in Section 4.3 of the main paper show 
that likelihood inference using such an approximate 
scale can give a better finite-sampling property than 
putting unjustifiable priors for fixed unknowns. 
Therefore, there is a way of deriving finite-sample 
likelihood inferences without paying the B-club fee. 

7. THREE IN ONE 

Likelihood inferences are for models with two ob- 
jects, namely, the data and fixed unknowns (pa- 
rameters), while the probability-based inferences of 
Bayesians are for models with data and random un- 
knowns (unobservables). Statistical models of recent 
interest often have both parameters and unobserv- 
ables. So it seems beneficial to combine the inference 
methods developed by the three schools. We are glad 
that Meng found pivoting to be easy and useful for 
eliminating 9 for his Bayesian inferences. Frequen- 
tists use probability statements to evaluate the per- 
formance of statistical methods. They use unobserv- 
ables (unobserved future data obtained by RSSP) to 
invoke probability statements. By allowing all three 
objects in statistical models and inferences, we hope 
to accommodate the advantages of all three schools 
in a unified framework. 

In discussing Lee and Nelder (1996), Smith won- 
dered whether it was time to bring the "two cul- 
tures" together. In Bayes's original paper (Bayes, 
1763) an example is given of balls rolled on a ta- 
ble; this seems to us to be naturally expressible as 
a two-stage likelihood problem, leading to the ques- 
tion, "Was Bayes a Bayesian in the modern sense?" 
We are very aware that the model class we con- 
sider, though having a wide scope, is as yet incom- 
plete, and also that there are aspects of the the- 
ory, in particular on the choice of scale for unob- 
servables in the definition of the h-likelihood and 
on the choice of scale for exact finite-sample like- 
lihood inferences, which are as yet incomplete. We 
are, however, particularly excited by the contribu- 
tion from Meng which seeks to connect our proce- 
dures to other generally accepted statistical ideas. 
We hope that the time has indeed come to bring 
the three cultures together! 
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